Network analysis is all about connections (surprise). What sets network-analytic approaches apart from ‘standard’ quantitative research is the inherent interdependence of the data: observations only exist BECAUSE they are connected. For example, multi-national trade relations require at least two countries to trade goods. Otherwise we cannot observe the phenomenon.
In network terms, each country in the trade example would be called a node (or vertex, pl. vertices) and the trade through which they are connected would be an edge (or tie). Nodes and edges can have additional features, as we will see below.
In network analysis, researchers can be interested in the position of specific nodes or in the overall network structure. There are several quantitative measures you can calculate for this, some of which are introduced below. The network measures can then be either explanatory or dependent variables, depending on your research question.
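To make the difference between node-level and network-level measures concrete, here is a minimal sketch. It assumes the dplyr, tidygraph and igraph packages are installed; the toy data and the names toy_net, node_degree and net_density are made up for illustration. Degree and density are just two examples of such measures.

```r
library(dplyr)
library(tidygraph)

# Hypothetical toy network: 4 nodes, 4 undirected edges
toy_net <- tbl_graph(
  nodes = tibble(id = 1:4),
  edges = tibble(from = c(1, 1, 2, 3), to = c(2, 3, 3, 4)),
  directed = FALSE
)

# Node-level measure: degree, the number of edges attached to each node
node_degree <- toy_net %>%
  activate(nodes) %>%
  mutate(degree = centrality_degree()) %>%
  as_tibble()

# Network-level measure: density, the share of possible edges that exist
net_density <- igraph::edge_density(toy_net)
```

Here node 3 has the highest degree (three ties), while the density describes the network as a whole: 4 of the 6 possible ties are present.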
There are several packages for network analysis in R. This tutorial will only feature tidygraph, ggraph, and visNetwork. See the Resources section for more great material, including material on the other packages.
That networks consist of two types of data (nodes and edges) is also visible in network objects in R. In tidygraph, these objects are named tbl_graph. Let’s construct one of those with some example node and edge data.
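library(tidyverse)  # tibble, dplyr verbs and the pipe
library(tidygraph)  # tbl_graph objects and graph manipulation
library(ggraph)     # plotting
node_list <- tibble(id = c(1:5))
edge_list <- tibble(from = c(1,1,1,2,3,3,3,4,5,5,5), to = c(2,2,3,4,2,4,5,5,2,2,2)) %>%
  group_by(from, to) %>%
  summarise(weight = n()) # weight = how often a from/to pair occurs
undir_net <- tbl_graph(nodes = node_list, edges = edge_list, directed = FALSE, node_key = "id")
undir_net
## # A tbl_graph: 5 nodes and 8 edges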
## #
## # An undirected simple graph with 1 component
## #
## # Node Data: 5 x 1 (active)
## id
## <int>
## 1 1
## 2 2
## 3 3
## 4 4
## 5 5
## #
## # Edge Data: 8 x 3
## from to weight
## <int> <int> <int>
## 1 1 2 2
## 2 1 3 1
## 3 2 4 1
## # ... with 5 more rows
The minimum content of node data is an identifier variable (here id). The identifier has to appear in the edge data as well, where it indicates which nodes are connected to one another (here from and to). I already added a weight to the edges. This is simply a count of how many connections exist between a pair of nodes. The resulting graph is undirected (directed = FALSE), which can be read as a ‘mutual relationship’.
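How the weight comes about can be sketched separately. The example below assumes dplyr is installed and reuses the raw from/to pairs of the example; raw_edges, weighted_edges and ids_known are illustrative names. It uses count(), which is a shorthand for the group_by() plus summarise(n()) combination and returns an ungrouped tibble.

```r
library(dplyr)

# The raw, unaggregated pairs from the example network
raw_edges <- tibble(from = c(1, 1, 1, 2, 3, 3, 3, 4, 5, 5, 5),
                    to   = c(2, 2, 3, 4, 2, 4, 5, 5, 2, 2, 2))

# Collapse repeated pairs into one row each and store the count as weight
weighted_edges <- raw_edges %>%
  count(from, to, name = "weight")

# Sanity check: every identifier used in the edges exists in the node data
ids_known <- all(c(weighted_edges$from, weighted_edges$to) %in% 1:5)
```

The eleven raw rows collapse into eight weighted edges; the pair 5 and 2, which occurs three times, gets a weight of 3.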
We can now plot this object with ggraph, which basically works like ggplot2 with some network-specific features. The layout option defines the algorithm that positions the nodes and edges relative to one another. The available layouts include those of the igraph layout_with_* functions. ‘kk’ stands for Kamada-Kawai, a very common force-directed layout algorithm that tries to place nodes so that their distance in the plot reflects their path distance in the network.
pal <- Manu::get_pal("Kereru") # define color palette for plotting (https://github.com/G-Thomson/Manu)
# undirected, unweighted, no attributes
ggraph(undir_net, layout = "kk") +
  geom_edge_link() +
  geom_node_point(size = 10, color = pal[1]) +
  geom_node_text(aes(label = id),
                 colour = "white", vjust = 0.4) +
  theme_graph() # theme without axes, gridlines etc.
Figure: Undirected, unweighted graph without attributes
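To see what a layout actually produces, the sketch below asks ggraph for the node coordinates instead of drawing them. It assumes dplyr, tidygraph and ggraph are installed and builds a small network under the made-up name layout_net; create_layout() returns one row per node together with its x and y position.

```r
library(dplyr)
library(tidygraph)
library(ggraph)

# Hypothetical small network, only used to inspect a layout
layout_net <- tbl_graph(
  nodes = tibble(id = 1:5),
  edges = tibble(from = c(1, 1, 2, 3, 4), to = c(2, 3, 4, 5, 5)),
  directed = FALSE
)

# Compute the Kamada-Kawai positions without plotting them
kk_positions <- create_layout(layout_net, layout = "kk")

# One row per node: x and y are the coordinates ggraph would draw
kk_coords <- data.frame(id = kk_positions$id,
                        x  = kk_positions$x,
                        y  = kk_positions$y)
kk_coords
```

Swapping "kk" for another layout name changes only these coordinates, not the network itself.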
Setting directed = TRUE gives every edge a direction: it now points from the node in from to the node in to. Some real-world examples are cash or trade flows, social media communication, or non-mutual friendship 😢
dir_net <- tbl_graph(nodes = node_list, edges = edge_list, directed = TRUE, node_key = "id")
ggraph(dir_net, layout = "kk") +
  geom_edge_link(arrow = arrow(angle = 15, type = "closed", length = unit(4, "mm")),
                 end_cap = circle(4, "mm")) + # so arrows don't overlap nodes
  geom_node_point(size = 10, colour = pal[1]) +
  geom_node_text(aes(label = id),
                 colour = "white", vjust = 0.4) +
  theme_graph()
Figure: Directed, unweighted graph without attributes
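One practical consequence of direction is that incoming and outgoing ties can be counted separately. The following is a small sketch, assuming dplyr and tidygraph are installed; flow_net and flow_degrees are hypothetical names and the three edges are invented for the example.

```r
library(dplyr)
library(tidygraph)

# Hypothetical directed network: 1 -> 2, 1 -> 3, 2 -> 3
flow_net <- tbl_graph(
  nodes = tibble(id = 1:3),
  edges = tibble(from = c(1, 1, 2), to = c(2, 3, 3)),
  directed = TRUE
)

# Count incoming and outgoing edges per node
flow_degrees <- flow_net %>%
  activate(nodes) %>%
  mutate(in_degree  = centrality_degree(mode = "in"),
         out_degree = centrality_degree(mode = "out")) %>%
  as_tibble()
```

Node 1 only sends ties (out-degree 2, in-degree 0), node 3 only receives them. In an undirected network this distinction disappears.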
We can also visualise the weight of the edges to show how ‘strong’ certain ties are, by mapping weight to the width aesthetic of the edges.
# undirected, weighted, no attributes
ggraph(undir_net, layout = "kk") +
  geom_edge_link(aes(width = weight),
                 alpha = 0.7) +
  geom_node_point(size = 10, colour = pal[1]) +
  geom_node_text(aes(label = id),
                 colour = "white", vjust = 0.4) +
  theme_graph(base_family="sans") # provide font family, otherwise can't render document (Windows)
Figure: Undirected, weighted graph without attributes
We are often interested in certain attributes of nodes or edges. To add some to the example data, tidygraph needs to know which data we want to alter (nodes or edges). Therefore, the package contains the activate function. Then, we can manipulate data with all the dplyr verbs we know and (mostly) love.
# undirected, unweighted, with attributes
undir_net.att <- undir_net %>%
  activate(nodes) %>%
  mutate(Preference = rep(c("Python", "R"), c(3, 2))) %>%
  activate(edges) %>%
  mutate(Relationship = sample(c("Friends", "Foes"), 8, replace = TRUE))
ggraph(undir_net.att, layout = "kk") +
  geom_edge_link(aes(label = Relationship),
                 angle_calc = "along", label_dodge = unit(2.5, "mm"), label_push = unit(10, "mm"),
                 alpha = 0.7) +
  geom_node_point(aes(colour = Preference),
                  size = 10) +
  scale_color_manual(values = pal[c(1,2)]) +
  geom_node_text(aes(label = id),
                 colour = "white", vjust = 0.4) +
  theme_graph(base_family="sans") # provide font family, otherwise can't render document (Windows)
Figure: Undirected, unweighted graph with attributes
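The effect of activate can also be seen without a plot. The sketch below assumes dplyr and tidygraph are installed; act_net and the other object names are hypothetical. It shows that a dplyr verb only touches the table that is currently active, and that as_tibble() pulls that table out of the graph.

```r
library(dplyr)
library(tidygraph)

# Hypothetical weighted network
act_net <- tbl_graph(
  nodes = tibble(id = 1:4),
  edges = tibble(from = c(1, 1, 2, 3), to = c(2, 3, 3, 4),
                 weight = c(2, 1, 3, 1)),
  directed = FALSE
)

# With the edges active, filter() drops edges but leaves all nodes in place
strong_ties <- act_net %>%
  activate(edges) %>%
  filter(weight > 1)

# With the nodes active, mutate() adds a node attribute
labelled <- strong_ties %>%
  activate(nodes) %>%
  mutate(group = if_else(id <= 2, "a", "b"))

# Extract each table as an ordinary tibble
edge_tbl <- labelled %>% activate(edges) %>% as_tibble()
node_tbl <- labelled %>% activate(nodes) %>% as_tibble()
```

After filtering, two edges remain while all four nodes are still there, including node 4, which no longer has any ties.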
The concept of attributes can be pushed further: we can say that nodes are of different types. This results in bipartite (or two-mode) networks, where nodes of the same type are not directly connected to one another, but only through the nodes of the other type. This could be employees in firms or authors of research papers, for example.
bipart_net <- play_bipartite(8, 2, p=0.8, directed = FALSE) %>% # play_* generates different types of networks
  activate(nodes) %>%
  mutate("Node.type" = as.character(if_else(type==TRUE, "Firm", "Employee")))
ggraph(bipart_net, layout = "stress") +
  geom_edge_link() +
  geom_node_point(aes(shape = Node.type , color = Node.type),
                  size = 6) +
  scale_color_manual(values = pal[c(1,2)]) +
  theme_graph(base_family="sans") # provide font family, otherwise can't render document (Windows)
Figure: Bi-partite graph
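The defining property of a bipartite network, that ties only run between the two node types, can be checked directly. This is a sketch under assumptions: dplyr, tidygraph and igraph are installed, and the employees, firms and object names are invented. The last step uses igraph's bipartite_projection() to derive the one-mode network of employees who share a firm.

```r
library(dplyr)
library(tidygraph)

# Hypothetical two-mode data: 3 employees (type FALSE) and 2 firms (type TRUE)
bip_net <- tbl_graph(
  nodes = tibble(name = c("Ann", "Bob", "Cem", "FirmX", "FirmY"),
                 type = c(FALSE, FALSE, FALSE, TRUE, TRUE)),
  edges = tibble(from = c(1, 2, 2, 3), to = c(4, 4, 5, 5)),
  directed = FALSE
)

# Look up the node type at both ends of every edge via .N()
edge_types <- bip_net %>%
  activate(edges) %>%
  mutate(from_type = .N()$type[from],
         to_type   = .N()$type[to]) %>%
  as_tibble()

# In a bipartite network the two ends always differ in type
only_cross_type <- all(edge_types$from_type != edge_types$to_type)

# Employees become directly connected when they share a firm
coworkers <- igraph::bipartite_projection(bip_net, which = "false")
```

In the projection Ann and Bob are tied through FirmX and Bob and Cem through FirmY, so the co-worker network has two edges.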
As described above, we need a node and an edge set to do network analysis in R. However, life out there seldom provides us with data in this specific format. This is contrary to, e.g., survey data that is already in a ready-to-use format (apart from some variable recoding etc.).
Instead, networks are mostly displayed as different kinds of matrices. How these relate to one another can be confusing and is something we don’t usually have to deal with in standard quant research. There are three types of matrices that can be used to describe a network and ‘translating’ them into the desired format varies by type. Often your data isn’t even a matrix yet. In that case, you first have to figure out to which of the following formats you can/should transform it.
An adjacency matrix is basically a cross-table of the same elements (mostly of the nodes), and is therefore square. The cell values can be restricted to 0/1 to indicate whether there are any connections between the elements, or be a count of the connections.
adj_mat <- matrix(sample(0:3, 16, replace = TRUE), nrow = 4)
colnames(adj_mat) <- rownames(adj_mat) <- LETTERS[1:4]
adj_mat
##   A B C D
## A 1 1 3 0
## B 0 3 2 1
## C 3 1 2 1
## D 2 3 1 1
tidygraph can create tbl_graph objects from a variety of data formats, including adjacency matrices.
## # A tbl_graph: 4 nodes and 10 edges
## #
## # An undirected multigraph with 1 component
## #
## # Node Data: 4 x 1 (active)
## name
## <chr>
## 1 A
## 2 B
## 3 C
## 4 D
## #
## # Edge Data: 10 x 3
## from to weight
## <int> <int> <dbl>
## 1 1 1 1
## 2 1 2 1
## 3 1 3 3
## # ... with 7 more rows
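A conversion of this kind can be sketched with as_tbl_graph(). The example assumes dplyr and tidygraph are installed and uses a small, made-up symmetric matrix (toy_adj) rather than the random one above, so the result is predictable; it is not necessarily the exact call that produced the printout.

```r
library(dplyr)
library(tidygraph)

# Hypothetical symmetric adjacency matrix: cells count the ties between two nodes
toy_adj <- matrix(c(0, 2, 0,
                    2, 0, 1,
                    0, 1, 0),
                  nrow = 3, byrow = TRUE,
                  dimnames = list(LETTERS[1:3], LETTERS[1:3]))

# directed = FALSE treats A-B and B-A as one and the same tie
adj_net <- as_tbl_graph(toy_adj, directed = FALSE)

# Non-zero cells become edges, the cell value ends up in the weight column
adj_edges <- adj_net %>% activate(edges) %>% as_tibble()
```

The row and column names become the node names, and the two non-zero pairs (A-B and B-C) become the two edges.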
An incidence matrix, in contrast, is a cross-table of different elements, e.g. nodes and edges or different types of nodes as in a bipartite graph.
inc_mat <- matrix(sample(0:3, 12, replace = TRUE), nrow = 3)
rownames(inc_mat) <- LETTERS[1:3]
colnames(inc_mat) <- letters[1:4]
inc_mat
##   a b c d
## A 1 1 0 0
## B 3 3 3 3
## C 2 3 2 3
Converting the incidence matrix to a network results in this:
## # A tbl_graph: 7 nodes and 10 edges
## #
## # A bipartite simple graph with 1 component
## #
## # Node Data: 7 x 2 (active)
## type name
## <lgl> <chr>
## 1 FALSE A
## 2 FALSE B
## 3 FALSE C
## 4 TRUE a
## 5 TRUE b
## 6 TRUE c
## # ... with 1 more row
## #
## # Edge Data: 10 x 3
## from to weight
## <int> <int> <dbl>
## 1 1 4 1
## 2 1 5 1
## 3 2 4 3
## # ... with 7 more rows
The type variable is a logical that denotes the node-type in a bipartite network.
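The same as_tbl_graph() call can be sketched for an incidence matrix. The example assumes dplyr and tidygraph are installed and that tidygraph recognises a non-square numeric matrix as an incidence matrix; toy_inc and its row and column names are invented.

```r
library(dplyr)
library(tidygraph)

# Hypothetical incidence matrix: rows are employees, columns are firms
toy_inc <- matrix(c(1, 0, 2,
                    0, 3, 1),
                  nrow = 2, byrow = TRUE,
                  dimnames = list(c("Ann", "Bob"), c("x", "y", "z")))

inc_net <- as_tbl_graph(toy_inc, directed = FALSE)

# Rows and columns both become nodes; type tells them apart
inc_nodes <- inc_net %>% activate(nodes) %>% as_tibble()

# Every non-zero cell becomes an edge between a row node and a column node
inc_edges <- inc_net %>% activate(edges) %>% as_tibble()
```

The two row elements get type FALSE and the three column elements type TRUE, and the four non-zero cells turn into four edges.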
The third format is the edge list. We already know this format, and it’s just a matrix in disguise. It consists of two columns with labels/names of elements that are connected to one another, and sometimes a weight column. It’s actually just the edgelist as in the example network before, yay!
## # A tibble: 8 x 3
## # Groups: from [5]
## from to weight
## <dbl> <dbl> <int>
## 1 1 2 2
## 2 1 3 1
## 3 2 4 1
## 4 3 2 1
## 5 3 4 1
## 6 3 5 1
## 7 4 5 1
## 8 5 2 3
## # A tbl_graph: 5 nodes and 8 edges
## #
## # An undirected simple graph with 1 component
## #
## # Node Data: 5 x 1 (active)
## name
## <chr>
## 1 1
## 2 2
## 3 3
## 4 4
## 5 5
## #
## # Edge Data: 8 x 3
## from to weight
## <int> <int> <int>
## 1 1 2 2
## 2 1 3 1
## 3 2 4 1
## # ... with 5 more rows
An edgelist is sufficient to create a network, but often we have some additional data with node attributes like the R and Python users in the example.
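Attaching such node attributes to a network built from an edge list could look like the sketch below. It assumes dplyr and tidygraph are installed; the edge list, the attribute table and all object names are made up. as_tbl_graph() stores the labels found in the edge list in a node column called name, which serves as the join key.

```r
library(dplyr)
library(tidygraph)

# Hypothetical edge list with character labels
pref_edges <- tibble(from = c("a", "a", "b"),
                     to   = c("b", "c", "c"),
                     weight = c(2, 1, 1))

# Hypothetical node attributes, stored in a separate table
pref_nodes <- tibble(name = c("a", "b", "c"),
                     Preference = c("Python", "R", "R"))

# Build the network from the edges, then join the attributes onto the nodes
pref_net <- as_tbl_graph(pref_edges, directed = FALSE) %>%
  activate(nodes) %>%
  left_join(pref_nodes, by = "name")

pref_tbl <- pref_net %>% as_tibble()
```

Nodes that appear in the edge list but not in the attribute table would simply get NA for Preference.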
Great intro using different packages, good example of how to create/manipulate node and edge sets:
https://www.jessesadler.com/post/network-analysis-with-r/
Intro to tidygraph by the creator Mr. Thomas Lin Pedersen himself:
https://www.data-imaginist.com/2017/introducing-tidygraph/ (reference manual: https://tidygraph.data-imaginist.com/reference/index.html)
Extensive overview of visualisation possibilities, also for interactive plots:
https://kateto.net/network-visualization
An application of tidygraph and ggraph with Game of Thrones data:
https://www.shirin-glander.de/2018/03/got_network/
Extensive ggraph intro:
http://mr.schochastics.net/netVizR.html